Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Union query result should work in JDBC format #2757

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

LantaoJin
Copy link
Member

Description

Union query only works with csv format, but without the '?format=csv' query parameter at the end of the POST call will throw NullPointerException. Reproduce:

PUT index001/_doc/1
{
  "query_id" : "1"
}

PUT index002/_doc/1
{
  "query_id" : "1"
}

POST /_plugins/_sql
{
  "query": "select query_id from index001 UNION select query_id from index002"
}

Currently, the UNION statement is implemented in legacy engine(v1). It's a bug in legacy engine(v1).
This PR fixes this issue in legacy engine.

Issues Resolved

Resolve #2540

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
    • [ ] New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

String query =
String.format(
Locale.ROOT,
"SELECT firstname, lastname FROM %s LIMIT 3 "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does UNION works for query with aggregation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, UNION with Agg in legacy engine has bugs by my testing whatever format set. This fixing doesn't cover all existed bugs. You can try below queries.

POST _plugins/_sql?format=csv
{
  "query": "select customer_gender,count(type) from opensearch_dashboards_sample_data_ecommerce group by customer_gender UNION select customer_gender,count(type) from opensearch_dashboards_sample_data_ecommerce group by customer_gender"
}
POST _plugins/_sql?format=csv
{
  "query": "select count(type) from opensearch_dashboards_sample_data_ecommerce group by customer_gender UNION select count(type) from opensearch_dashboards_sample_data_ecommerce group by customer_gender"
}

May be we could open another ticket to fix the problems in Union with Agg? Or migrating to v2?

Copy link
Member Author

@LantaoJin LantaoJin Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And seems current implementation doesn't distinguish UNION and UNION ALL. A better way is to re-implement UNION in v2, PPL also requests multi-search implementation.
Again, this fixing is just for addressing the NPE problem of UNION query in JDBC format (such a common use case).

@vamsimanohar
Copy link
Member

Can you add little more details on where exactly is the bug?

@@ -105,7 +105,7 @@ private List<String> getFormatsForColumn(String columnName) {
private Set<String> getDateColumns(List<Schema.Column> columns) {
return columns.stream()
.filter(column -> column.getType().equals(Schema.Type.DATE.nameLowerCase()))
.map(Schema.Column::getName)
.map(Schema.Column::getIdentifier)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to the current issue?

Copy link
Member Author

@LantaoJin LantaoJin Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catching! This change should be reverted. What change we need for alias case is just this diff: https://github.com/opensearch-project/sql/pull/2757/files#diff-943465139286a5010c19cab380f141e3c9bc26fb02652bedbc6a7bbec5c98cc3R46

@@ -22,7 +23,7 @@ public class MultiQueryAction extends QueryAction {
private MultiQuerySelect multiQuerySelect;

public MultiQueryAction(Client client, MultiQuerySelect multiSelect) {
super(client, null);
super(client, new Union(multiSelect.getFirstSelect(), multiSelect.getSecondSelect()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the root cause of the issue?

Copy link
Member Author

@LantaoJin LantaoJin Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, plus missing call loadFromEsState for both side of Union.

@LantaoJin
Copy link
Member Author

Can you add little more details on where exactly is the bug?

The NPE issue caused by calling loadFromEsState for both SELECT clauses of UNION children is missing. The fixing is adding https://github.com/opensearch-project/sql/pull/2757/files#diff-744a18b0fa02b144d9aaf91192a6594175eb0bafdca6e01adb841093c36854f5R105-R108 and https://github.com/opensearch-project/sql/pull/2757/files#diff-d81d74cf75806ee8450a8a706d7e71d21787d2cabfcb2c4f2c0e83176e711940R26

@@ -180,8 +185,12 @@ private void loadFromEsState(Query query) {
Map<String, FieldMappingMetadata> typeMappings = mappings.get(indexName);

this.indexName = this.indexName == null ? indexName : (this.indexName + "|" + indexName);
this.columns.addAll(
renameColumnWithTableAlias(query, populateColumns(query, fieldNames, typeMappings)));
if (isPartOfUnion(query) && !this.columns.isEmpty()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between isUnionQuery and isPartOfUnion? Do we need the new partOfUnion field?

Copy link
Member Author

@LantaoJin LantaoJin Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isUnionQuery checks if the query type is Union, just similar to isJoinQuery does, checking if the type is Join.
The children type of Join is TableOnJoinSelect, a special type rather than a normal Select. But The children type of Union is a normal Select, unless we set a new type for Union's children, we can't identify a Select is a normal one or a child of Union in loadFromEsState, that's the reason I add partOfUnion in Select. Or add a new class class UnionSubSelect extends Select. Which one would be better?
https://github.com/opensearch-project/sql/pull/2757/files#diff-0b55e630b732c43e3c8ea8038eb068f21a3c46bcd1744c98be4cc94bad799b84R50

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Query with UNION operator fails with NullPointerException
4 participants